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Directional data arise in various contexts such as oceanography 
(wave directions) and meteorology (wind directions), as well as with 
measurements on a periodic scale (weekdays, hours, etc.). Our con- 
tribution is to introduce a model-based approach to handle periodic 
data in the case of measurements taken at spatial locations, antici- 
pating structured dependence between these measurements. We for- 
mulate a wrapped Gaussian spatial process model for this setting, 
induced from a customary linear Gaussian process. 

We build a hierarchical model to handle this situation and show 
that the fitting of such a model is possible using standard Markov 
chain Monte Carlo methods. Our approach enables spatial interpola- 
tion (and can accommodate measurement error). We illustrate with a 
set of wave direction data from the Adriatic coast of Italy, generated 
through a complex computer model. 

1. Introduction. Directional or angular data arise, for instance, in ocea- 
nography (wave directions) , meteorology (wind directions) and biology (study 
of animal movement). They also arise from periodic data, for example, event 
times might be wrapped to a weekly period to give a circular view (elimi- 
nating end effects) of the pattern of event times. Here, we assume the data 
is recorded in degrees or angles on a circle. This is not a limitation, as 
any circular scale (e.g., [0,L) or [— L/2,L/2)) can be transformed to [0, 27r) 
by a modulus transformation. Handling such data creates difficulties due 
to the restriction of support to the unit circle, [0,27r), and to the sensitiv- 
ity of descriptive and inferential results to the starting point on the circle. 
Hence, analysis of directional data is more challenging than for linear data. 
There exists a substantial literature on circular data [see, e.g., Mardia (1972), 
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Mardia and Jupp (2000), Jammalamadaka and SenGupta (2001) or Fisher 
(1993)], but, broadly, it is confined to descriptive statistics and limited in- 
ference for simple univariate models. 

The contribution of this paper is to introduce a fully model-based ap- 
proach, that is, Bayesian hierarchical modeling, to handle angular data, en- 
abling full inference regarding ah model parameters and prediction under the 
model. Our focus is on multivariate directional observations arising as an- 
gular data measurements taken at spatial locations, anticipating structured 
dependence between these measurements. Thus, we formulate an attractive 
spatial process model for directional data, the wrapped Gaussian process, 
induced from a linear (customary) Gaussian process. We illuminate the de- 
pendence structure. We show how to implement kriging of mean directions 
and concentrations in this setting. We work within a hierarchical Bayesian 
framework and show that introduction of suitable latent variables facilitates 
Markov chain Monte Carlo model fitting. We offer an adaptive truncation 
strategy for simulation of these latent variables. 

Directional data has a long history. Early contributors to the theoretical 
development include Watson and Stephens [Watson (1961), Stephens (1963, 
1970)]. Kent (1978) studied complex circular distributions. The books of 
Mardia (1972) and Mardia and Jupp (2000) present approaches, distribution 
theory and inference for such data. In Fisher (1993) we find comprehensive 
discussion, with particular attention to nonparametric methods. Compu- 
tational procedures such as MCMC methods and the EM algorithm have 
enabled analysis for directional data to become less descriptive and more in- 
ferential. Examples include linear models [Harrison and Kanji (1988), Fisher 

(1993) , Fisher and Lee (1992), Kato, Shimizu and Shieh (2008)], linear mod- 
els in a Bayesian context [Guttorp and Lockhart (1988), Damien and Walker 
(1999)] and models for circular time series [Breckling (1989), Coles (1998), 
Mardia and Jupp (2000), Ravindran (2002), Hughes (2007), Fisher and Lee 

(1994) , Holzmann et al. (2006)]. Recently, Kato [Kato (2010)], building upon 
earlier work [Kato, Shimizu and Shieh (2008)], has proposed a discrete time 
Markov process for circular data using the Mobius circle transformation, 
connecting it with an early Markov process model of Fisher and Lee (1994). 
We offer a process model for locations in d-dimensional space but focus on 
the 2-dimensional case. 

There is little in the way of formal multivariate theory for circular data, 
particularly in the fully Bayesian setting. In this regard, perhaps the work of 
Coles (1998) is the closest to ours. He also employs wrapped distributions, 
noting that, in the Gaussian case, they can be readily given a multivariate 
extension. Coles mostly works with independent replicates of multivariate 
circular data in low dimension with an unknown covariance matrix and de- 
velops some theory and examples for the time series setting. However, he 
mentions possible extensions to the spatial setting but offers no development. 
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in particular, no thoughts on regression or kriging (Sections 3.5 and 3.6 be- 
low). Casson and Coles (1998) include spatial dependence in looking at the 
direction of maximum wind speed. With little detail, they propose condition- 
ally independent directions modeled with a von Mises distribution, introduc- 
ing spatial structure in the modal direction and concentration parameters. 
We introduce spatial structure directly on the angular variables, with a sin- 
gle, potentially high-dimensional multivariate observation but driven by a 
spatial process model, yielding a high-dimensional covariance matrix with 
structured dependence as a function of perhaps two or three parameters. 

Our motivating example is drawn from marine data. Often, such data 
are based on outputs from deterministic models, usually climatic forecasts 
computed at several spatial and temporal resolutions. Wave heights and 
outgoing wave directions, the latter being measured in degrees relative to a 
fixed orientation, are the main outputs of marine forecasts. Numerical mod- 
els for weather and marine forecasts need statistical post-processing; wave 
heights, like wind speed, being linear variables, can be treated in several 
ways [Kalnay (2002), Wilks (2006), Jona Lasinio et al. (2007)]. Wave di- 
rections, being angular variables, cannot be treated according to standard 
post-processing techniques [see Engel and Ebert (2007), Bao et al. (2010) 
and references therein]. In Bao et al. (2010) bias correction and ensemble 
calibration forecasts of surface wind direction are proposed. The authors use 
circular-circular regression as in Kato, Shimizu and Shieh (2008) for bias 
correction and Bayesian model averaging with the von Mises distribution for 
ensemble calibration. However, their approach does not explicitly account 
for spatial structure. In our setting, wave direction data is viewed differently 
from wind direction data. The former is only available as an angle, while the 
latter is customarily associated with wind speed, emerging as the resultant 
of North-South and East- West wind speed components. 

Eventually, we plan to do joint spatio-temporal modeling of wave height 
and wave direction (linear and circular data), fusing numerical model output 
with observed buoy data. As a first step, here, we take up static spatial 
modeling for the WAve Model (WAM) data (see Section 4.2), deferring the 
joint modeling, dynamics and data fusion for a future paper. 

The format of the paper is as follows. Section 2 reviews wrapped dis- 
tributions for univariate circular data. Section 3 moves on to the wrapped 
Gaussian process. Section 4 takes up the wave direction application and 
Section 5 offers a summary and future directions. 

2. Univariate wrapped distributions. The von Mises (or circular nor- 
mal) distribution [see Mardia (1972), Mardia and Jupp (2000)] is the most 
common specification for the univariate case. It is extensively studied and 
inference techniques are well-developed, but a multivariate extension is still 
a work in progress. A few recent papers [Mardia, Taylor and Subramaniam 



4 



G. JONA-LASINIO, A. GELFAND AND M. JONA-LASINIO 



(2007), Mardia et al. (2008)] report applications of bivariate and trivariate 
von Mises distributions, with procedures that are complex and computation- 
ally intensive. However, for usual spatial settings, multivariate distributions 
of dimension 50, 100 or more arise. 

For the wrapping approach, let y be a random variable on M, henceforth 
referred to as a linear random or unwrapped variable, with probability den- 
sity function g{y) and distribution function G{y). The induced wrapped 
variable {X) of period 27r is given by 

(1) X = ymod27r. 

Evidently, < X < 27r. The associated directional probability density func- 
tion f[x) is obtained by wrapping g{y), via the transformation Y=X + 2X7^, 
around a circle of unit radius, with K being the winding number. It takes 
the form of a doubly infinite sum, 

oo 

(2) f{x)= ^ g{x + 2k-K), Q<x<2-n. 

k = ~OD 

From (2), we see that the joint distribution of {X,K) is g{x + 2/c7r) with 
X G [0, 27r) and G Z = {0, ±1, ±2, . . .}. That is, Y'^{X,K);Y determines 
{X, K) and vice versa. Then, marginalization over k produces (2). From this 
joint distribution the marginal distribution of K is P{K = k) = j^^ g{x -\- 
2k-K) dx. Additionally, K\X = x is such that P{K = k\X = x) = g{x + 2k-K)/ 
Yl^=-oo9i^ ~^ 2ivr) while the conditional distribution of X\K = k is g{x + 
2kir)/ g{x -\- 2k'n) dx. Hence, the wrapped distributions are easy to work 
with, treating K as a. latent variable. In the context of simulation-based 
model fitting, sampling the full conditional distribution for K will be re- 
quired. The end of Section 2.1 provides an automatic truncation approxi- 
mation to facilitate such sampling. 

Expectations under f{x) are generally difficult to calculate; a more con- 
venient variable to work with is the associated complex random variable 
on the unit circle in the complex plane, Z = e^^ . In particular, for integer 
p, E{e^'^^) =ipYip), where ip is the characteristic function of Y [Jammala- 
madaka and SenGupta (2001)]. 

2.1. The wrapped normal distribution. Under (2), the wrapped normal 
distribution arises with g a normal density indexed by parameter consist- 
ing of mean, /i, and variance <t^. In fact, we can envision ii = jl + K^, where 
fl is the mean direction, a parameter on [0,27r) and i^T^ is the associated 
winding number; as above, fi determines p.. We can reparametrize o"^ to 
c = 6"°^ < 1 where c is referred to as the concentration parameter [Jam- 
malamadaka and SenGupta (2001), pages 27-28]. We write the wrapped 
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normal distribution of X as WN(/i,cj^) with the probabihty density func- 
tion, 

(3) f{x) = —^ Yl ^^P|^ -I 0<x<27r. 



2a2 

From above, E{ZP) = f^"" e*P^/(x) dx = e^Pi^-P^^^^ . Thus, we have 
(4) E(Z) = e"'''/2(cos^ + «sin^) 

so that the resultant length of Z is the foregoing concentration parameter c. 
Furthermore, if E cos x = c cos /U = c cos fi and E sin x = c sin fi = c sin /i, with 
/2 the mean direction, as above, then /i = arctan*(£^sinX, iiJcosX).^ 

To implement Markov chain Monte Carlo model fitting with wrapped 
normal distributions, we introduce K as & latent variable (Section 2.2). 
Hence, we will have to sample K^s, one for each location, at each iteration. 
It is difficult to sample over the support {0, ±1, ±2, . . .}. However, it is well 
known that (3) can be approximated with only a few terms. For instance, 
Mardia and Jupp (2000) comment that, for practical purposes, the density 
can be approximated by truncation to G { — 1,0,1} when > 2tt, while 
for o"^ < 27r using only k = gives a reasonable approximation. We can be 
more precise. Suppose we translate from X to X' = (X + vr) mod 27r — vr to 
achieve symmetric support, [— 7r,7r), with corresponding translation of fi to 
n'. Then, suppressing the primes for convenience, with ip denoting the unit 
normal density function. 



OO 



E -A — - — E -A — - — 



k= 



OO 



E 



fc = — OO 

So- 



{{2k+l)^T-^J.)/a 

ip{z) dz. 

{{2k-l)TT-fi)/a 



Careful calculation reveals that, if kjj = 1+ [|^J = —ki (where [aj denotes 
the integer nearest to a rounded toward 0), then {2ku + l)7r — > 3(T and 
{2kL — l)vr — /Li < —3a. As a result, 

°o /•{{2k+l)2T7-iM)/a /.((2A;+l)7r-/^)/cr 

E / f{z)dz> / (p{z)dz 

fc=_^^((2fc-l)7r-M)/<7 j^, J{(2k-l)n~ti)/a 

> j ^{z)dz = 0.997. 



^From Jammalamadaka and SenGupta [(2001), page 13] arctan* (5, C) is formally de- 
fined as arctan(S/C) if C > 0,5' > 0; 7r/2 if C = 0,5* > 0; arctan(S/C) + tt if C < 0; 
arctan(S/C) + 27r if C > 0, 5" < 0; undefined if C = 0, 5 = 0. 
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Expression (5) facilitates MCMC model fitting since it allows us to de- 
termine the number of terms needed for good approximation as a function 
of cr, for example, if o" < 27r/3, then k G { — 1, 0, 1}; if 27r/3 < o" < 47r/3, then 
A: G {-2, -1,0, 1,2}. 

So, K can be large if and only if cr^ can be large. Under a simulation- 
based model fitting the pair will not be well-identified unless we introduce an 
informative prior on cr^. Moreover, when the WN concentration c is small 
(cr^ large), it becomes difficult to discriminate the WN from the uniform 
circular distribution. With simulation experiments generating 1000 samples 
from WN's using several combinations of sample sizes and variance values, 
uniformity tests such as Rayleigh, Kuiper-Watson and Rao fail to discrim- 
inate between the WN and the uniform distribution for cj^ = 3.252 with 
small sample sizes (n = 30), for cx^ = 4.02 when n = 100 and = 7.11 when 
n = 1000. Hence, it will be worthwhile to employ exploratory data analysis, 
for example, through the foregoing tests (available, e.g., in the CircStats 
library of R), with moments estimators for Jl and o"^, in order to assess 
whether to model using a WN. We recall these moments estimators: with 
angular data xi, . . . , rr^ and the foregoing notation, \etC = ^ Yll^=i cosxj and 
S = ^ SILi sinxj. Setting C = ccos fl and S = csin fl, we obtain moments es- 
timators for c and fl as e~"^^'^ = c = V + S'^ and fl = arctan*(S', C). 

2.2. Model fitting within a Bayesian framework. For data {xi, X2, . . . , x„}, 
as suggested above, it is easiest to write the full Bayesian model in terms 
of the joint distribution {(Xi, Ki),i = 1, 2, . . . , n} given /i, with a prior on 

and o"^, that is, as Ili-^ip{{xi + 27r/cj — fj,)/a)[fj,,a'^]. Hence, the posterior 
for this model involves the latent {Ki} (only employed to facilitate model 
fitting) as well as /x and o"^. The -fCj's will be updated in implementing a 
Gibbs sampler, but only the posterior samples of /x (which will provide pos- 
terior samples of fl) and cr^ will be of interest. To update the Ki, since we 
are given fi and o"^, we can use (5) to implement adaptive truncation, that 
is, we can take m = 1 + [|^J and k = {—m, . . . , 0, . . . , m}. Then, 

(p{{xi + 2ki7r - fJ.)/(j) 



Fr{Ki = ki\fi,a,Xi 
(6) 



ki = — m, . . . , 0, . . . , m. 



Thus, we achieve constant maximum approximation error at each iteration. 

The discussion of the previous section helps in prior specification. First, 
as it is customary, we assume fi and cr^ are independent. For /i we would 
adopt a normal distribution, say, something like A^(/io,Co)- Recalling that 
/i = /i + K^, this induces a WN prior on fi but also makes it clear that 
we cannot learn about fi from the Xj's, that is, in (6), we cannot identify 
the ki^s and A;^, hence the /cj's and /U. However, we can learn about fl. 
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Furthermore, due to the conjugacy, we obtain a famihar normal for the fuh 
conditional for /i, that is, N( ^" £,:(^'+2^fe»)+^'Mo ^ ^^V-^ ^ . ^ f^^^^ ^^^^ 

Tier Q ~\~cr (T ~|~ti(Tq 

previous section, we suggest a right truncated inverse Gamma with known 
scale /3o and shape ao and truncation defined according to o"^ and n, as in 
Section 2.1. For example, if the sample size is n = 30, the inverse gamma 
can be right truncated at vr. Then, the full conditional for o"^ will be a 
right truncated inverse gamma with shape parameter + n/2 and scale 
parameter /3o + \ ^[}21=i{^i + 2fcj7r — jj)"^]. With such priors, the ki are still 
updated as in (6). 

The MCMC is straightforward, though convergence for /i and the KiS 
will not be achievable due to the identifiability problem. However, we can 
perform usual convergence diagnostics on the /U + 27r/^j on /z and on a^'s, 
all of which are well identified. Then, the posterior samples of o"^ inform 
about posterior features for cr^, and hence for c. The posterior samples of /i 
yield posterior samples of /i; for the latter, we can adopt whatever posterior 
centrality summary we wish. Attractively, we can directly create a 1 — a 
credible set, that is, a symmetric posterior credible arc [Fisher (1993)]. This 
is merely the arc that contains the central 1 — a proportion of the posterior 
samples. 

3. Wrapped Gaussian processes. Here, we show how a Gaussian process 
model for linear spatial data induces a spatial process model for wrapped 
data. We examine some of the properties of the induced wrapped process, in 
particular, the induced covariance structure. We discuss model fitting for di- 
rectional data obtained at a collection of spatial locations using this process. 
Again, we adopt a hierarchical modeling approach, describing model fitting 
within the Bayesian framework using MCMC. We briefly look at regression 
in the context of wrapped periodic data. Finally, we show how to implement 
Bayesian kriging, that is, spatial prediction, within this framework. As with 
Bayesian kriging in the context of usual Gaussian processes, we are able to 
implement such prediction as a post-model fitting activity. 

3.1. Multivariate wrapped distributions. There is surprisingly little lit- 
erature on multivariate directional data modeling. Bivariate circular distri- 
butions (whence the support is a torus) are discussed in Mardia and Jupp 
(2000), Jammalamadaka and SenGupta (2001). In Kato, Shimizu and Shieh 
(2008) bivariate circular Cauchy distributions are considered in the circular- 
circular regression framework (Section 3.5). In this setting, there is effort to 
define a sensible measure of correlation between such pairs and to test for 
independence for such pairs (Section 3.3). We shall see that things simplify 
when we work with wrapped normal distributions. Our multivariate motiva- 
tion is a setting where the directional data are wave directions at locations 
and there is anticipated spatial dependence between the angular variables. 
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As a general strategy, it is perhaps easiest to obtain a multivariate wrapped 
distribution for, say, X = {Xi,X2, ■ ■ ■ ,Xn) starting with a multivariate lin- 
ear distribution for Y = (Yi,Y2, . . . ,Yn). In particular, suppose Y~5(-), 
where g{-) is a n-variate distribution on M". Usually, g is a family of dis- 
tributions indexed by, say, 0; a convenient choice for g{-) is an n-variate 
normal distribution. Let K = {Ki,K2, . . . , Kn) be such that Y = X + 27rK, 
analogous to the univariate case. Then, the joint distribution of X and K 
is g{x + 27rk) for < Xj < 27r, j = 1, 2, . . . , n and kj G Z, j = 1, 2, . . . , n. The 
marginal distribution of X is an n-fold doubly infinite sum of g{:x- + 27rk) 
over Z". Its form is intractable to work with, even for moderate p. Again, 
we introduce latent Kj^s to facilitate the model fitting [see Coles (1998) in 
this regard and Section 3.3 below]. 

We say that X has a p-variate wrapped normal distribution when g{-;0) 
is a multivariate normal where = (n, 5]), with an n x 1 vector of mean 
directions and S a positive definite matrix. Using standard results, the con- 
ditional distribution of Yj given {Yi,l ^ j} and 6 is immediate, hence, as 
well, the distribution of Xj,Kj given {Xi,KiJ ^ j} and 6. 

3.2. Wrapped spatial Gaussian processes. A Gaussian process on M*^ in- 
duces a wrapped Gaussian process on W^. In particular, the Gaussian process 
(GP) is specified through its finite dimensional distributions which in turn 
induce the finite dimensional distributions for the wrapped process. Hence, 
we are returned to the multivariate wrapped distributional models of the 
previous subsection. In particular, if s S and Y{s) is a GP with mean 

and covariance function, say, (j'^p{s — s';4>), where (j) is a decay pa- 
rameter, then, for locations si, S2, ■ ■ ■ , Sn, X = (X(si), X(s2), . . . , ^(sn)) ~ 
WN(/x,f7^R((/>)), where n = . . . , ^(s„)) and R{(j))ij = p{si - Sj;<j)). In 

the sequel we utilize a stationary, in fact, isotropic covariance function but 
with regard to the model fitting (see below); other choices could be inves- 
tigated similarly. We note that the multivariate wrapped modeling in Coles 
(1998) employs replications to learn about a general S for the multivariate 
model. We do not require replications due to the structured spatial depen- 
dence introduced through the GP. Also, to implement a spatial regression 
model for angular response with linear covariates, it is necessary to intro- 
duce a monotone link function h{-) from to (— 7r,7r) with h{0) = 0, for 
example, h{-) = arctan(-) [Lee (2010)]. In the sequel, we confine ourselves to 
the case where /i(s) = fi. 

3.3. Fitting a wrapped GP model. Model fitting for a wrapped GP within 
a Bayesian framework can be done using MCMC. First, suppose a linear GP 
model of the form Y{si) = p + w{si),i = 1,2, . . . ,n, where 'w{si) is a mean 
GP with covariance function cj"^ p{s — s';(j)). Consider an exponential co- 
variance and a prior on 6 = {p, o"^, (j)) of the form [/^f] [c^] [</>] which is normal. 
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inverse Gamma and uniform, respectively. Because of the well-known identi- 
fiability issue with and (p [Zhang (2004)], it is often best to update them 
as a pair on the log scale using a Metropolis-Hastings step with a bivari- 
ate normal proposal, following the implementation scheme underlying the 
R package spBayes [Finley, Banerjee and Carlin (2007)]. For the wrapped 
GP, the approach follows that of Section 2 by introducing a latent vector of 
K's. Again, the induced wrapped GP provides X ~ WN(;ul, cr^R((/')), where 
R{4')jk = p{sj — Sk',4>) so that the joint distribution of X, K takes the form 
iV(X + 27rK|/_fl,(T2R((/))). From above, we suggest a normal prior for fi, a 
truncated inverse gamma prior for cr^ and, for the decay parameter, we have 
employed a uniform prior with support allowing small ranges up to ranges a 
bit larger than the maximum distance over the region. The full conditionals 
for fi and <t^ are similar to those in Section 2. The full conditional for cj) is 
unpleasant because it is buried in the covariance matrix associated with the 
n-variate normal distribution for X + 27rK. Again, we have found it best to 
update cj^ and (/> as a pair on the log scale as in the linear case. In fact, with 
such joint sampling, very large values of o"^ are rejected in the M-H step, 
so, in fact, we do not need to impose any truncation on the prior for o"^. 

Finally, the full conditionals for the Ki arise from the conditional distri- 
bution oi Yi = Xi + 2Trki\{Yj = Xj -\- 2TTKj,j ^ i};6. The form is analogous 
to (6) with fi and cr^ replaced by the conditional mean and variance //j 
and af which are functions of {Xj,j / i}, {Kj,j / i} and 6. The adaptive 
approximation of Section 2.1 can be employed. 

Moments estimates associated with the wrapped Gaussian process are use- 
ful for the same two purposes as in the independence setting. One is to help 
specify priors to facilitate inference stability, following the discussion of Sec- 
tion 2. The other is to provide a sensible range for starting values to begin the 
MCMC model fitting. Under normality, again, ^^(e''''-^'*)) = exp(z/i — o"^/2). 
So, moment estimators for mean direction, uncertainty and concentration 
are as in the independence case. 

3.4. Induced correlation for the circular variables. We have defined the 
wrapped GP in terms of the linear GP which has a covariance or corre- 
lation structure (and the parameters which specify them). For a bivariate 
circular variable there is no unique choice of correlation measure. However, 
in Jammalamadaka and SenGupta [(2001), Chapter 8], we find considerable 
discussion of suitable measures for the correlation between two circular vari- 
ables. Jammalamadaka and Sarma (1988) propose a measure which satisfies 
many of the properties of the product moment correlation coefficient. For 
the wrapped bivariate normal for variables Xi,X2 with covariance matrix 

2 2 

(^2 ) it simplifies to pc{Xi,X2) = sinh(/3(T^)/sinh((T^). Hence, with a 
valid covariance function a'^p{s,s'), the induced correlation function for the 
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distance 



Fig. 1. Exponential spatial correlation function (solid circle = 0.01, solid triangle 
4> — 0.Q2, solid square (j!> = 0.05j and the corresponding wrapped correlation (empty cir- 
cle <j) — 0.01, empty triangle 4> — 0.02, empty square 4> — 0.05) for 3 values of the decay 
parameter (f>. 



wrapped Gaussian process is Pc{s, s') = '^'"gjnhfi^f • Fig^^^^ 1 provides a pic- 
ture of the exponential correlation function for various choices of the decay 
parameter (p and the corresponding correlation for the wrapped process. For 
a given distance, association is similar but a bit weaker under the latter. 

As an aside, though we don't build any covariance matrices using sinh(/?(ti)), 
we note that it is a valid covariance function on if a"^ p{u) is. This is 
evident since sinh(i') = Yl'^=i (2n+i)! • Because p{u) is a valid correlation 
function, hence a characteristic function by Bochner's theorem, fP{u) is a 
characteristic function for any integer p > 0. So, sinh(p(it)) is a mixture 
(with suitable normalization) of characteristic functions, hence a character- 
istic function, therefore, again, by Bochner's theorem, a valid covariance 
function itself. 



3.5. Circular- circular regression with WN models. For general bivari- 
ate directional data models, say, g{Xi,X2)., circular-circular regression for, 
say, Xi given X2, is discussed in Jammalamadaka and Sarma (1993) using 
trigonometric polynomial approximations. In the case of wrapped bivariate 
normal distributions, we can obtain the regression explicitly, avoiding ap- 
proximation. We also note that Kato, Shimizu and Shieh (2008) consider 
circular-circular regression curves obtained under a Mobius circle transfer- 
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mation [See also Downs and Mardia (2002) in this regard]. Incidentally, 
they discuss a bivariate circular distribution which is specified through the 
product of a conditional times a marginal wrapped Cauchy distribution. 
We return to the wrapped bivariate normal for variables Xi,X2 with mean 

2 2 

fii , /i2 and covariance matrix ( ^2 ^^2 ) • The regression model we seek is 

E{e'^''\Xi;e) in the form c{Xi;9)e'f^-'^^'^\ where = {fii, fi2,cr'^ , p)- Then, 
we can interpret, respectively, p,{Xi;6) as the conditional mean direction 
and c(Xi]6) as the conditional concentration for X2 given Xi. 
For the associated unwrapped normal distribution, 

Y2\Yr,e^Nif,2\iiYi;e),al^,{9)), 

where /i2|i(^i; ^) = p2+ pO^i — pi) and a'^^^-^iO) = (t^(1 — p^). Hence, writing 
Yx = Xi + 2ttKi, we have XajXi, Ki; 6> ~ WN(/i2|i(Xi + 27rKi); 61), cj^ ^(6>)), 
where p{Y) = ;u(y) mod 27r and, therefore, 

E{e'''^\Xi,Ki-e) = exp(-cj2|i(0)/2 + i~p2\i{Xi + 2^Ki; 0)). 

Next, we have ^(e*^^ 0) = ^Xi|Xi;0^(e*'^' \Xi,Ki;e), where the con- 
ditional distribution for K\X,6 under the wrapped normal is discussed in 
Section 2. So, let p{k;Xi,G) = P{Ki = k\Xi,e). Then, 

ii;(e'^^|Xi;0)=exp(-4i(0)/2) 

00 

X p{k;Xi,e)eMiP2\i{Xi + 2TTk;d)). 

fc=— 00 

We see that c(Xi; 0) = e^^'n^^^/^ exp(i/i(Xi; 0)) = Efel-ooP(^; ^1, ^) x 
exp(i/i2|i(^i+27rfc;0)). So, cos(/x(Xi; 0)) = X]^_ooP(^; -'^i. ^) cos(/i2|i(^i + 
27rk; 9)) and sm{p{Xi;9)) = E^=_^pik; Xi,9) sm{fl2\i{Xi+2Trk; 9)). Mak- 
ing the usual inversion, 

(8) p{Xi ; 9) = arctan* {sm{fi{Xi -,9)), cos(;u(Xi ; 0))) . 

In practice, we would compute p{Xi;9) by appropriate truncation of K. If 
we fit the bivariate wrapped normal distribution with data (Xii,X2i),i = 
l,2,...,n, using MCMC, posterior samples for 9 enable posterior samples 
for fj,(Xi;9) and c{Xi;9) at any Xi. 

3.6. Kriging with wrapped GP models. Kriging is a customary activity 
with spatial data. In this context, we would have, as observations, X = 
{X{si), X{s2), ■ ■ ■ ,X{sp)) and we would seek to predict X{so) at a new lo- 
cation So- In fact, we shall argue that this is a straightforward post-model fit- 
ting exercise and can be implemented following the ideas of circular-circular 
regression for the wrapped normal from the previous subsection. 
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Suppose, for the linear "observations," Y = {Y{si),Y{s2), ■ ■ ■ , Y{sp)) along 
with Y{s()) we have the joint distribution 



U(«o)J=^lU(.o)J'" Vp^v(^) 1 

Evidently, we can obtain the distribution for Y{sq)\^,0, hence the wrapped 
normal distribution for X(so)|X,K,0 and, thus, E(e*'^(*o)|X, K, 0), follow- 
ing the previous section. Now, for ii^(e*"^^*°^ |X, 6) we would need to marginal- 
ize over the distribution of K|X, 0. This requires a p-fold sum over a mul- 
tivariate discrete distribution, hopeless for large p even with considerable 
truncation, for example, a sum over 3^ terms if each Ki is allowed only 3 
values. 

In fact, within the Bayesian modeling framework we seek E^e^-^^^^^l^). 
Fitting the spatial wrapped GP model as in Section 3.4 will yield poste- 
rior samples, say, (0^, K^), 6 = 1, 2, . . . , 5. Then, as usual, £;(e'-^('o)|X) = 
-^K,6>|x-E'(e*"^''*°'*|X,K,0) and so a Monte Carlo integration yields 

(10) i5;(e^^(^")|X)«|^exp(-a2(so,0D/2 + ^/i(so,X + 2^K^;0,*)), 

b 

where /i(so, Y, 6) and a'^{so, 6) extend the notation /i2|i and cjgii of the previ- 
ous section toy(so)|Y. lfgc{so,X.) = ^f,. exp(-cj2(so, 06)/2) cos(/i(so, X + 
27TKI- 61)) and gs{so,X) = ^Y:b* exp(-a2(so, 0fe*)/2) sin(/i(so, X+27rK*; 0^)), 
then the posterior mean kriged direction is 

(11) /i(so,X) = arctan*(5o,s(X),5ro,c(X)) 
and the associated posterior kriged concentration is 



(12) c(so, X) = ^ {g,{so, X))2 + {g,{so, X))l 

4. Examples and MCMC implementation. In Section 4.1 we offer some 
simulation examples, while in Section 4.2 we turn to the motivating spatial 
wave direction data. We fit the model described in Section 3.2 and imple- 
mented the kriging following Section 3.6. We note that MCMC model fitting 
as described in Section 3.4 is well behaved. 



4.1. Simulation examples. In the simulations we generate samples of size 
n = 100 from an unwrapped GP with constant mean n and we obtain the 
directional process by wrapping them onto the unit circle. Locations are 
generated uniformly using coordinates taken from the real data example of 
Section 4.2; we work with two different sample sizes by randomly choos- 
ing 30 and 70 sites for posterior estimation and the remaining are used for 
validation. We fix the covariance structure for the linear GP to be exponen- 
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tial with parameters (a'^jip). Here we report examples generated with ip = 
0.021 corresponding to a practical range of 142.86 km (maximum distance 
spanned by the coordinates is 290.13 km), /i = tt and three different vari- 
ances cr^ =0.1,0.5,1 corresponding to concentrations c = 0.951, 0.779, 0.606 
respectively. The prior for is a Gaussian distribution with zero mean and 
large variance (with an induced wrapped normal prior for jl). For cr^ we 
use informative inverse gamma distributions centered on the true value and 
variances 0.01, 0.06 and 0.07, respectively. For the decay parameter the prior 
is a uniform distribution in [0.001,1) when = 0.1,0.5 and in [0.001,0.5) 
when (T^ = 1. For all priors settings, several variance values were considered 
to assess behavior under strongly and weakly informative priors. The block 
sampling of cr^ and (p produces strongly autocorrelated chains for both pa- 
rameters and slow convergence compared to the independence case. We run 
the MCMC for 30,000 iterations, we discard the initial 6000 and we apply 
a thinning of 10, using 2400 samples for estimation. 

The precision of estimates is studied by 95% credible intervals, obtained 
as discussed in Section 2.2. We compute kriging estimates as in (11) and 
(12) for each specification and compute an average prediction error, defined 
as the average circular distance^ between the observed and kriged estimate, 
over the validation set of observed values. That is, for the validation set 
{^^(•s*),^ = l,2,...,m}, we compute ^ Xl^(l - cos(/u(sj, X) - 2;(s*))). We 
also compare this spatial interpolation to prediction obtained by fitting the 
independent wrapped normal model proposed in Section 2 and the prior 
structure described there. Comparison is assessed through average error, 
computed for the nonspatial model as the average circular distance between 
the directions in the validation set and the estimated mean direction of the 
nonspatial model. 

In Table 1 posterior estimates and interpolation errors are given for both 
the spatial and nonspatial models. Both models can recover the mean di- 
rection and concentration with wider confidence intervals when c is small. 
The decay parameter is correctly estimated when c = 0.951,0.779 for both 
sample sizes but requires larger sample size when c = 0.606. There is sub- 
stantial reduction in average prediction error using the wrapped GP model 
when there is spatial dependence, while for small ranges (not shown) the 
spatial model performs comparably to the nonspatial one. 

4.2. Wave direction data analysis. The majority of studies carried out 
on marine data are based on outputs from deterministic models, usually cli- 
matic forecasts computed at several spatial and temporal resolutions. Since 
the 1980s, deterministic models have been used for weather forecasting with 
increasing reliability. Moreover, in the last decade, sea surface wind data 



^We adopt as circular distance, d(a,/?) = 1 — cos(q — /3), as suggested in Jammala- 
madaka and SenGupta [(2001), page 16]. 
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Table 1 

Results from simulated data with = n and <j) = 0.013 for all simulations. Posterior 
mean and concentration point estimates are obtained averaging the MCMC samples, 
while the decay point estimate is obtained as modal value 
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projections, produced by meteorological models, have been found accurate 
enough to be taken as the basis for operational marine forecasts. Wave 
heights and outgoing wave directions, the latter being angular data mea- 
sured in degrees, are the main outputs of marine forecasts. The principal 
provider for global numerical wave forecasts in Europe is the European Cen- 
ter for Medium-Range Weather Forecasts (ECMWF), which runs at global 
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medium range (3-5 up to 10 day forecasts, 55 km spatial resolution) and at 
high resolution short term (3 days, 28 km resolution WAve Model, WAM) 
models in the Mediterranean Area. WAM outputs are given in deep waters 
(more than 100 m depth) on a grid with about 25 x 25 km cells. 

Calibration with buoy data for wave heights was studied [Jona Lasinio 
et al. (2007)] using a multistep model that included outgoing wave directions 
as a categorical variable (categorized according to an 8 sector windrose) 
not allowing for a full exploitation of the available information. The main 
use of these calibrated data is in starting of runs of shallow water forecast 
models [SWAN model, see, e.g., Holthuijsen (2007)]. The output of these 
models provide the basis for coastal alarm in severe weather and for the 
evaluation of coastal erosion. These models are built on grids with about 10 
km spatial resolution and they include as inputs calibrated and downscaled 
wave heights, bathymetry, wind speed and direction all given at the same 
spatial resolution. Wave directions are aligned to the finer grid simply by 
taking the circular mean of intersected WAM cells. Here we propose a fully 
model-based solution using a downscaling procedure. 

We analyze data from a single time during a storm in the Adriatic Sea. 
The data are outgoing wave directions produced by the WAM during the 
analysis, that is, a run of the forecast of the model at time t = 0. We use 45 
points from the WAM grid, as they cover a fairly homogeneous area during 
the storm movement. In Figure 2 the entire Adriatic network of buoys is 
shown together with the WAM grid and the estimation area (delimited by 
continuous lines). Available data yield a moments estimated sample con- 
centration of c = 0.8447, in the North-East direction (moments estimated 
sample mean direction fl = 0.5540). In the data only five values differ from 
7r/4 by more than 0.8 radians [they are marked on Figure 4(b)], four are 
located in the upper north part of the area outside the Gargano Peninsula, 
one is located near the coast inside the Mattinata Gulf. This last value, be- 
ing located in the curve of the gulf, may account for a local wave direction 
inversion. The other four seem to describe some turbulence in the wave field. 
We keep them in the data set to see how the spatial model deals with this 
local behavior. We are interested in downscaling these WAM values to a 
grid of 222 cells with 10 km resolution. 

Arguably, an angular data model such as a multivariate version of the 
von Mises distribution might be more natural here than a periodic data 
model like our wrapped normal. However, because the WN allows convenient 
extension to the spatial setting and because there is little practical difference 
between the WN and von Mises on the circle, we work with the latter, as 
detailed in Section 3. In the absence of covariates, we employ the constant 
mean direction model (though a trend surface could be investigated). In 
fitting, we run the MCMC algorithm for 30,000 iterations, with a burn- 
in of 6000, and compute kriging estimates as in (11) and (12) using 2400, 
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Fig. 2. WAM gnd, Italian Wave Momtonng network in the Adriatic Sea and estimation 
area, squares denote locations of buoys. 

taking one sample every 10 samples. We choose the following prior setting: 
f72~IG(9,4) (i.e., mode = 0.4 and variance = 0.04), ~ Unif (0.001, 0.1) 
and, for the mean, a Normal distribution with zero mean and large variance. 
In Figure 3 the posterior learning from the data is shown. Posterior estimates 
return a posterior mean direction of 0.971 radians ([0.471,1.814] is a 95% 
credible interval) and posterior concentration of 0.618 ([0.427,0.753] is 95% 
credible interval). The decay parameter, estimated as posterior modal value, 
is 0.023 ([0.017,0.096] is a 95% credible interval). In Figure 4 results from 
a leave-one-out validation procedure are shown. It can be seen that the 
five outliers are shrunk toward the 45° line. The average prediction error is 
0.0488. 

In Figure 5 an arrow plot of incoming wave directions with length of the 
arrows proportional to 1 — c (the longer the vector, the more variable the 
estimates at that grid point) is shown. In the arrow plot we choose to present 
the incoming direction^ instead of the outgoing one to better visualize the 
model impact of waves on the coast during storms. From Figure 5 we can 



^Incoming wave directions are obtained with a 180° rotation of the outgoing wave 
directions. 
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Fig. 3. Prior (dashed) and posterior (solid) distributions for the WAM data. 



see that the model captures local features of the wave field, that arrows 
rotate in the upper part of the plot (following the contour of the Gargano 
promontory and the presence of the outliers), and that more variable esti- 
mates are obtained near the coast line (mimicking similar behavior of wave 
heights before calibration). Finally, we compare our results with those from 
the nonspatial model (which we ran for 20,000 iterations, keeping the second 
half for estimation, using priors as described in Section 2.2). The estimated 
posterior mean direction is 0.5507 ([0.3278,0.7810] 95% C.I.) and the pos- 
terior concentration is 0.8272 ([0.7748,0.8667] 95% C.I.). The leave-one-out 
validation yielded an average predictive error of 0.1553, revealing that the 
spatial model yielded a reduction of 68%. 

5. Summary and future work. We have introduced a class of spatial pro- 
cess models which can be used to study circular data that is anticipated to 
have spatial structure. In particular, adopting the class of wrapped distri- 
butions for describing circular data, we have shown how a usual Gaussian 
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Fig. 4. (a) Kriged estimate vs. observed using leave-one-out validation, (b) locations of 
the 5 outliers present m the data (grey squares). 



process specification can be converted to a wrapped normal process spec- 
ification. We have demonstrated that fitting for such models can be done 
straightforwardly within a hierarchical modeling framework using MCMC 
methods with adaptive truncation. We have also shown how to implement 
kriging for such models. Our motivating application has revealed the pre- 
dictive benefit of the spatial modeling. 
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Fig. 5. Krigmg results for WAM data: wave directions represented as arrow, the length 
of the arrows is proportional to 1-concetration. Here we report incoming waves directions. 
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Particularly with directional data, one would expect there to be concerns 
regarding measurement error, for instance, with monitors recording wind 
direction and with buoys measuring wave direction. We are unaware of any 
measurement error work in the context of directional data but addressing 
it using wrapped Gaussian processes turns out to be straightforward. It 
will be captured by a nugget similar to the usual geostatistical modeling 
setting. In fact, a frequent interpretation of the nugget is measurement error 
[Banerjee, Carlin and Gelfand (2004)]. Suppose the observed angular data 
are collected with conditionally independent measurement error, that is, 
Xoj ~ WN(ytj-, r^), where Ytj = X^j + 27rKtj and Xtj is the true angular 
direction, with independence across j. As above, we introduce latent Kqj 
and model the joint distribution of Xoj,Koj as N{Xoj + 2TTKoj\Xtj,T'^). 
The latent true directions follow the wrapped GP of Section 3.2. That is, 

Xt = {Xt{si),Xt{s2),. . ■,Xt{sp)) ~ WN(^1, a^Km, 

where R{4')ij = p{si — sj; (p). Introducing K^, the joint distribution for , 
takes the form A^(X( + 27rK( (j^R(0)). The full model takes the form 

(13) n, [Xoj + 27rKo,j\Xt,, , r2] [X^ + 27rK< |^, a^K{^)] [fi] [a^] [r^] [0] . 

An issue that requires further investigation is the matter of the assump- 
tion of stationarity in the covariance function. It would be useful to de- 
velop diagnostics to examine this, paralleling those for linear spatial data. 
At present, all we can suggest is model comparison using average predictive 
error as in Section 4. 

Future work will investigate promise of extending the wrapped normal 
process to a wrapped i-process through the usual Gamma mixing that ex- 
tends a GP to a t-process [see, e.g., Zhang, Wu and Chang (2007)]. Exten- 
sion to the i-process discussed in Heyde and Leonenko (2005) would be more 
challenging. It will also lead us to incorporate dynamic structure into our 
modeling; with regard to our data set, we have wave direction information at 
various temporal resolutions. Finally, we will explore two data assimilation 
issues. The first is to fuse the angular data produced by the WAve Model 
(WAM) with the buoy data (RON) to improve our interpolation of wave 
direction. The second involves joint modeling of the wave direction data 
with the available associated wave height data. Hopefully, joint modeling 
will enable a version of co-kriging to improve the individual interpolations. 
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